Option panel: Spider
- Accept cookies
Accept cookies sent by the remote server.
If you do not accept cookies, some "session-generated" pages will not be retrieved.
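As an illustration of why this matters, here is a minimal Python sketch (not HTTrack's code) using a shared cookie jar, so that a session cookie set by one response is sent back on the next request; the URLs are hypothetical.

    import urllib.request
    from http.cookiejar import CookieJar

    # Share one cookie jar across requests, as a cookie-accepting spider would.
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    # Hypothetical URLs: the first response sets a session cookie; the second
    # page is only served correctly when that cookie is sent back.
    opener.open("http://example.com/start")
    page = opener.open("http://example.com/session-generated-page").read()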
- Check document type
Define when the engine has to check the document type.
The engine must know the document type (MIME type) in order to rewrite file extensions. For example, if a link such as /cgi-bin/gen_image.cgi generates a GIF image, the saved file will be named "gen_image.gif" rather than "gen_image.cgi".
Avoid "never", because the local mirror could end up broken.
- Parse java files
Should the engine parse .java files (Java classes) to look for embedded filenames?
Checked by default.
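For illustration only, a rough Python sketch (not HTTrack's parser) of what such a scan could look like: pull quoted string literals that look like resource filenames out of Java source so they can be queued for download. The regular expression and the sample snippet are assumptions.

    import re

    # Quoted string literals ending in a known resource extension.
    FILENAME_RE = re.compile(r'"([^"]+\.(?:gif|jpe?g|png|class|au|mid))"', re.IGNORECASE)

    def referenced_files(java_source):
        return FILENAME_RE.findall(java_source)

    # Hypothetical applet code referencing an image:
    print(referenced_files('Image img = getImage(getCodeBase(), "logo.gif");'))
    # -> ['logo.gif']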
- Spider
Should the engine follow the remote site's robots.txt rules when they exist?
The default is "follow".
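The Python standard library ships a robots.txt parser, so the "follow" behaviour is easy to sketch (this is not HTTrack's code; the site and user agent are hypothetical):

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("http://example.com/robots.txt")
    rp.read()

    # "follow": only fetch the page if robots.txt allows it for our user agent.
    if rp.can_fetch("MyMirrorBot/1.0", "http://example.com/private/page.html"):
        pass  # download the page
    else:
        pass  # skip it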
- Update hack
Attempt to limit re-transfers by working around known bogus server responses.
For example, pages with the same size will be considered up to date, even if their timestamps differ. This can be useful for many dynamically generated pages, but in rare cases it can leave pages not updated.
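The trade-off can be summed up in a small Python sketch (an illustration of the idea, not HTTrack's logic):

    def is_up_to_date(local_size, remote_size, local_mtime, remote_mtime,
                      update_hack=True):
        # Matching timestamps are always trusted.
        if local_mtime == remote_mtime:
            return True
        # With the hack enabled, an identical size is also treated as
        # "unchanged" even though the timestamps differ; a genuinely updated
        # page that happens to keep the same size would then be missed.
        return update_hack and local_size == remote_size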
- Tolerant requests
Tolerate wrong file sizes, and make requests compatible with old servers.
Unchecked by default, because this option can produce corrupted files.
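As a rough illustration of tolerating a wrong file size (not HTTrack's code), a Python fetch can keep the partial body instead of failing when the Content-Length header does not match what actually arrived; the host and path are hypothetical:

    import http.client

    def tolerant_get(host, path):
        conn = http.client.HTTPConnection(host)
        conn.request("GET", path)
        resp = conn.getresponse()
        try:
            return resp.read()
        except http.client.IncompleteRead as err:
            # The server announced more bytes than it sent: keep what we got.
            return err.partial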
- Force old HTTP/1.0 requests
This option forces the engine to use HTTP/1.0 requests and to avoid HEAD requests.
Useful for sites running old server versions, or with many dynamically generated pages.
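A minimal Python sketch of such a request (not HTTrack's code): a plain HTTP/1.0 GET over a socket, with no HEAD request and no HTTP/1.1 features; the host is hypothetical.

    import socket

    def http10_get(host, path="/"):
        with socket.create_connection((host, 80)) as sock:
            request = ("GET {} HTTP/1.0\r\n"
                       "Host: {}\r\n"
                       "\r\n").format(path, host)
            sock.sendall(request.encode("ascii"))
            chunks = []
            # HTTP/1.0: the server closes the connection when the body ends.
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks)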